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ABSTRACT 

Motivation: The lllumina Infinium 450 k DNA Methylation Beadchip is a 
prime candidate technology for Epigenome-Wide Association Studies 
(EWAS). However, a difficulty associated with these beadarrays is that 
probes come in two different designs, characterized by widely differ- 
ent DNA methylation distributions and dynamic range, which may bias 
downstream analyses. A key statistical issue is therefore how best to 
adjust for the two different probe designs. 

Results: Here we propose a novel model-based intra-array normal- 
ization strategy for 450 k data, called BMIQ (Beta Mixture Quantile 
dilation), to adjust the beta-values of type2 design probes into a stat- 
istical distribution characteristic of type1 probes. The strategy involves 
application of a three-state beta-mixture model to assign probes to 
methylation states, subsequent transformation of probabilities into 
quantiles and finally a methylation-dependent dilation transformation 
to preserve the monotonicity and continuity of the data. We validate 
our method on cell-line data, fresh frozen and paraffin-embedded 
tumour tissue samples and demonstrate that BMIQ compares favour- 
ably with two competing methods. Specifically, we show that BMIQ 
improves the robustness of the normalization procedure, reduces the 
technical variation and bias of type2 probe values and successfully 
eliminates the type1 enrichment bias caused by the lower dynamic 
range of type2 probes. BMIQ will be useful as a preprocessing step for 
any study using the lllumina Infinium 450 k platform. 
Availability: BMIQ is freely available from http://code.google. 
com/p/bmiq/. 
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Supplementary information: Supplementary data are available at 
Bioinformatics online 
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1 INTRODUCTION 

In the past few years, the field of epigenomics has risen to prom- 
inence (Feinberg, 2010; Petronis, 2010). Epigenomics not only 
offers an improved understanding of fundamental biological 
processes such as cellular differentiation and early embryogenesis, 
but is also widely recognized to be key in understanding 
the pathogenesis of complex genetic diseases like cancer 
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(Baylin and Ohm, 2006; Feinberg et al, 2006; Jones and Bayhn, 
2007). One particular epigenetic mark of interest is DNA methy- 
lation. Indeed, DNA methylation markers have been proposed as 
early detection, diagnostic and prognostic markers in a wide range 
of different diseases (Rakyan et al., 2011). Underpinning this 
increased interest in epigenomics are significant advances in bea- 
darray technology, which now allow routine measurement of 
DNA methylation at over thousands of CpG dinucleotides 
(Bibikova et al, 2009, 2011; Sandoval et al, 2011). Among 
these, the lllumina Infmium 450 k Human Methylation 
Beadchip offers both scalability and coverage (>480 000 probes) 
and is thus suitable for Epigenome-Wide Association Studies 
(EWAS) (Dedeurwaerder et al, 2011; Rakyan et al, 2011; 
Sandoval et a/., 2011). 

A key statistical issue with the lllumina 450 k beadchip is that 
probes come in two different designs, which causes the methyla- 
tion values derived from these two designs to exhibit widely dif- 
ferent distributions (Dedeurwaerder et al, 2011). Indeed, type2 
probes are typically characterized by a much lower dynamic 
range compared with typel probes, even after adjustment for dif- 
ferences in biological characteristics such as CpG density 
(Dedeurwaerder et al, 201 1). Comparison with bisulphite pyrose- 
quencing data further showed that type2 probe values are biased 
and generally less reproducible (Dedeurwaerder et al, 2011). To 
correct for this bias, a peak-based correction (PBC) method was 
proposed (Dedeurwaerder et al, 2011) which normahses type2 
design probes so as to render them comparable with typel 
probes. Making the statistical distributions of typel and type2 
probes comparable is important for several reasons. Not doing 
so may introduce an enrichment bias towards typel probes when 
ranking probes in supervised analyses, as the dynamic range of 
typel probes is significantly higher. Moreover, methods that seek 
to determine differentially methylated regions (Jaffe et al, 2012) 
also assume that probes within these regions are comparable and 
thus one would want to avoid any sources of technical variation 
within them. Finally, one would wish to apply unsupervised di- 
mensional reduction algorithms (Houseman et al, 2008; Koestler 
et al, 2010) and classification algorithms (Zhuang et al, 2012) to 
one single dataset, and not separately to two different assays. 

Although the PBC method was vahdated in one dataset 
(Dedeurwaerder et al, 20\\) and has now been implemented in 
a pipeline for 450 k data (Wang et al, 2012), two recent studies 
have exposed potential problems with PBC, specially when 
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applied to tissue samples (Maksimovic et al., 2012; Touleimat 
and Tost, 2012). In fact, as noted in these studies, PBC breaks 
down when the methylation density distribution does not exhibit 
well-defined peaks/modes. Hence, both studies proposed subset 
quantile normalization methods (SQN and SWAN) to correct 
for the type2 bias and which avoid the pitfalls of PBC 
(Maksimovic et al, 2012; Touleimat and Tost, 2012). In this 
work, we show that PBC often leads to discontinuities ('holes') 
in the type2 density distribution. To address this problem, we 
here propose a novel mixture model-based normalization algo- 
rithm, called Beta Mixture Quantile dilation (BMIQ). We sub- 
ject BMIQ to a rigorous evaluation using numerous independent 
datasets and using a number of different evaluation criteria to 
assess its robustness and performance. Specifically, we assess 
BMIQ in terms of reducing (i) the technical variance, (ii) the 
type2 bias, (iii) and the above-mentioned typel enrichment 
bias. We further benchmark BMIQ against PBC and SWAN. 
For assessing technical variance and to allow a comprehensive 
comparison of BMIQ to PBC/SWAN across many datasets, we 
use in addition to replicates, a novel evaluation framework based 
on using adjacent typel-type2 probe pairs within probe clusters, 
a framework which we show leads to consistent and robust con- 
clusions across 10 independent datasets. We demonstrate that, 
overall, BMIQ compares favourably to PBC and SWAN. 

2 METHODS 

2.1 Biological data: DNA methylation 

Illumina Infinium 450 k DNAm assay The DNA methylation 
data considered in this work were all generated using Illumina's 
Infinium Human Methylation 450 k beadchip. Full details of this tech- 
nology are described in Bibikova et al. (2011) and Sandoval et al. (2011). 
Briefly, the methylation value of each probe follows an approximate ^- 
valued distribution, with /3 constrained to lie between 0 (unmethylated 
locus) and 1 (methylated). This follows from the definition of ^ as the 
ratio of methylated to combined intensity values, i.e 



where U and M are the unmethylated and methylated intensity values of 
the probe (averaged over bead replicates) and e = 100 is a small correc- 
tion term to regularize probes of low total signal intensity (i.e. probes 
with C/ + M ~ 0 after background subtraction). Throughout we used 
non-background-corrected DNAm data. Of the 485 577 probes, 72% 
are of a type2 design in which the U and M measurements are obtained 
in different colour channels, while the rest (28%) of the probes are of the 
old typel design in which both U and M measurements are obtained in 
the same colour channel. Importantly, typel and type2 probes differ sig- 
nificantly in terms of CpG density, with CpGs mapping to CpGs islands 
overrepresented among typel probes (Bibikova et al., 2011; Sandoval 
et al, 2011). 

Datasets 1 and 2: ( BT) and ( CL) This is a subset of the 450 k 
dataset considered in Dedeurwaerder et al. (201 1). We used the data from 
eight fresh frozen (FF) breast tumours and eight normal breast tissue 
specimens [hereafter refered to as (BT)], as well as the three replicates 
from the HCT116 WT cell-line [hereafer refered to as (CL)]. For these 
cell-lines, matched bisulphite pyro sequencing (BPS) data were available 
for nine type2 probes. 

Datasets 3 and 4: (FFPE) and (FF) This 450 k dataset consists 
of 32 formalin-fixed paraffin-embedded (FFPE) head and neck cancers 



(HNCs), of which 18 were HPV+ and 14 HPV-, as well as five fresh 
frozen HNCs (FF), of which 2 were HPV+ and 3 HPV-. The data are 
available from GEO under accession number GSE38271. 

Dataset 5: ( GBM) This 450 k dataset consists of 81 glioblastoma 
multiformes (GBMs) (Turcan et al, 2012), 49 of which were categorized as 
CpG island methylator positive (CIMP+) and 32 as CIMP-. 

Datasets 6-10: TCGA, LIV, LC, BLDC, HCC These 450 k 
samples are all from the TCGA: Specifically, Dataset6 (TCGA) consists 
of 10 samples as provided in the Bioconductor data package 
TCGAmethylation 450 k, Dataset7 (LIV) consists of nine normal liver 
tissue samples from Batch203 in the TCGA data portal, Dataset8 (LC) 
consists of 22 lung cancer samples from Batchl96, Dataset9 (BLDC) 
consists of 12 bladder cancer samples from Batch86 and Dataset 10 
(HCC) consists of 10 hepatocellular carcinoma samples from Batchl53. 

2.2 BMIQ: Beta Mixture Quantile dilation 
normalization strategy 

The normalization of type2 probe values into typel must satisfy the 
following criteria, (i) It must allow for the different biological character- 
istics of typel and type2 probes, i.e typel probes are significantly more 
likely to map to CpG islands than type2 probes, and hence the relative 
proportion of methylated and unmethylated probes will vary between the 
two designs. In the case of the type2 probes, this means that these pro- 
portions must be invariant under the normalization transformation, (ii) 
The transformation of the type2 probe values should reduce the bias, 
which amounts to matching of the density distributions of the two 
design types, specially at the unmethylated and methylated extremes, 
(iii) The transformation must be monotonic, that is, the relative ranking 
of beta values of the type2 probes must be invariant under the transform- 
ation. Next, we propose a normalization strategy for the type2 probes 
satisfying the above properties and which is based on three steps: 

• Fitting of a three-state (unmethylated- U, hemimethylated-H, fully 
methylated-M) beta mixture model to the typel and type2 probes 
separately. For sake of convenience we refer to intermediate allelic 
methylation as hemimethylation even though hemimethylation 
is most often used in the context of strand- specific methylation. 
Let ^{/), ^ir)' (<^M'^m)} denote the parameters of the 
three beta distributions for the typel probes, and similarly let 

(fl^.^i/). (<2m'^m)} describe the estimated parameters of 
the three beta components for the type2 probes. State membership 
of individual probes is determined by the maximum probability 
criterion. 

• For those type2 probes assigned to the U-state, transform their 
probabilities of belonging to the U-state to quantiles using the in- 
verse of the cumulative beta distribution with beta parameters 
{a^jj,b^jj) estimated from the typel U component. Let denote 
the normalized values of the type2 U-probes. 

• For those type2 probes assigned to the M-state, transform their 
probabilities of belonging to the M-state to quantiles using the in- 
verse of the cumulative beta distribution with beta parameters 
{a\^,b{^) estimated from the typel M component. Let ^jj^ denote 
the normalized values of the type2 M-probes. 

• For the type2 probes assigned to the H-state, we perform a dilation 
(scale) transformation to 'fit' the data into the 'gap' with endpoints 
defined by max{^^} and minj^yji^}. 

We next describe each of the above steps in detail. We first model each 
beta value ^ as. 
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where B denotes the beta probability density function and / denotes the 
design type t = (I,II). We infer the parameters (Tt,a,b) using an 
Expectation Maximization (EM) algorithm as described in (Ji et al., 
2005). The estimated parameters we denote again by (nl^al^bl) where t 
labels the design and s one of the three states (U,H,M). The resulting 
means of the estimated beta-distributions are denoted by where 



Further, let Un,Hii,Mjj denote the set of type2 probes assigned to 
unmethylated, hemimethylated or fully methylated states (using the max- 
imum probability criterion), and let Ufj (Ufj) denote the set of Un probes 
with yS-values smaller (larger) than m^. Similarly, let Mfj (Mfj) denote the 
set of Mu probes with yS-values smaller (larger) than m^. This subdivision 
into values which fall left (L) or right (R) of the mean are necessary since 
the state membership probabilities estimated from the EM algorithm are 
two tailed. Next, for the Ufj probes we estimate their type2 tail probabil- 
ities of belonging to the C/-state, i.e p = P(U\^ul) = F(^jjL\a^j^, b^^) where 
F denotes the cumulative distribution beta function. We then transform 
these probabilities back to quantiles (i.e ^0- values), but using the typel 
parameters, i.e 

q = F-'(p\a'^,b'^) (4) 

and finally set the normalized ^0- value, r]jjL = q. An identical transform- 
ation (using 1 - F instead of F) is performed for the Ufj probes. Next, 
we perform the analogous operation for the Mfj and Mfj probes. This 
therefore yields normalized type2 values for all type2 U and M probes. 

Finally, it remains to normalize the type2 H probes. Since the type2 H 
probe value distribution is sandwiched between the U and M probe dis- 
tributions, we can use an empirical approach to normalize these values, 
thus also bypassing the difficulty that type2 H probe values are not well 
described by a beta distribution (Supplementary Fig. SI). Specifically, we 
first identify the minima and maxima of the type2 //-probes, 
maxH = max{/32} and minH = min{y0^}, and let A^^ = maxH — minH. 
We also find the minimum of the M-probes, i.e minM = mm{/3^^} and the 
maximum of the C/-probes, i.e. maxU = max{y0^}. We point out that in 
fact all of these extrema represent robust values, because they do not 
represent extrema on the bounded (0,1) support, i.e. the values 
maxU, minH, max H and minM are not close to 0 or 1. Next, we define 
distances 

AuH = minH — maxJJ 

Ahm = —maxH + minM 

We want the new normalized maximum and minimum values of 
//-probes to satisfy 

nmaxH = min{?7^} - Ahm 

nminH = max{?]{^} -\- Auh 

so that A^^ = nmaxH — nminH. The normalized ^0- values for the 
//-probes is then given by the conformal (shift + dilation) transformation 

ri^^ = nminH + - minH) (5) 

where df = A^^ / A^^ is the dilation factor. It is important to observe that 
the conformal transformation involves a non-uniform rescaling of the 
H probe beta values since it depends on the beta-value of the probe. 
This is absolutely key in order to avoid gaps or holes from emerging in 
the normalized distribution. 

This algorithm is flexible in that the dilation can be performed also 
including the (and/or the C/^)//) probes, which means that the match- 
ing of the density distributions is only done on the respective tails (i.e. the 
Mfj and Ufj probes). We point that in practice we find that the optimal 
performance is attained by including the Mfj probes with the //-probes 



when performing the conformal transformation. This is because we 
observed that it is the left tail end of the methylated type2 distribution 
that is generally not well described by a beta- distribution (Supplementary 
Fig. SI), presumably as a result of the dye bias, which is specific to the 
type2 distribution. 

There are a number of other important points to note about BMIQ: (i) 
First, it is important to choose reasonable initial weight parameters 
{n^lP, 71^^, 71^} in the EM-algorithm. As these fractions can vary signifi- 
cantly from study to study, or even sample to sample depending on the 
nature of the samples assayed, it is important to choose reasonable initial 
values on a per-sample basis. Not doing so may result in mild disconti- 
nuities in the type2 density distribution. To obtain estimates for these 
prior weight parameters, we first note that their estimation only requires 
estimates for the two thresholds used for calling the three states, since the 
weights for a given sample are determined given a choice of thresholds. 
Moreover, although the thresholds will show little inter-sample variabil- 
ity, the weights may not, reflecting the biological differences in the 
number of probes that are unmethylated, hemimethylated or fully methy- 
lated. In BMIQ, the estimation of the initial thresholds proceeds in an 
automatic fashion on a per-sample basis: in detail, we use the estimated 
thresholds from the typel distribution (which always gives an excellent fit. 
Supplementary Fig. SI) to then obtain type2 -specific thresholds using a 
simple correction reflecting the difference in the modes between the typel 
and type2 distributions. Specifically, if is the lower threshold 
(i.e. typel ^ values less than are called unmethylated) and UM^^ 
and UM^^^ are the estimated modes of the unmethylated typel and 
type2 components, the intial prior estimate for would 
be t^lP = fl) +{UM^^^ -UM^^). Similarly, the threshold for calling 
probes fully methylated or just hemi-methylated would be 
= t^^ + (MM^^^ - MM^^) where MM denotes the mode of the 
methylated state. We note that resulting thresholds would normally 
fall within the ranges 0.2-0.3 and 0.60-0.8, respectively. Having thus 
identified reasonable initial estimates for the weights {ti^^^, ti^^, ti^}, 
the algorithm will then automatically determine the unmethylated, hemi- 
methylated and methylated fractions for each sample individually, 
(ii) A second important observation is the overall robustness of BMIQ 
to the goodness of the type2 EM-fit. This is important, since we consis- 
tenly observe that the methylated type2 distribution is not well described 
by a beta function (Supplementary Fig. SI). In this regard we have 
also verified that using a beta mixture model with more than three 
states does not improve the overall type2 fit. Fortunately however, as 
explained above, the goodness of fit problem associated with the H 
and M-probes can be easily circumvented by modelling only the right 
tail of the methylated component as the corresponding tail of a beta 
distribution. In this case, the left tail is modelled together with the H- 
probes using the observed empirical distribution. Hence, the probe values 
that are not well described by a beta distribution are not normalized using 
estimated beta parameters, which means that their normalization is in- 
sensitive to the goodness of fit. 

3 RESULTS 

3.1 Improved robustness of BMIQ 

To vahdate BMIQ, we first apphed it to data where the PBC 
method has been shown to work reasonably well. Thus, we 
applied it to a fresh frozen breast tumour sample from 
Datasetl (Dedeurwaerder et aL, 2011) (Fig. lA). We can see 
that for this particular sample, the methylated typel peak is 
well defined and as a result both PBC and BMIQ appear to do 
well in generating smooth density distributions for the type2 
probes, which at the methyl ation extremes are also reasonably 
well matched to the typel density distribution. Next, we applied 
both PBC and BMIQ to the FFPE tumour samples from 
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DatasetS, for which the typel methylated peak was not well 
defined (Fig. IB). In these samples, PBC generated a type2 dens- 
ity distribution that exhibited relatively sharp changes ('holes') 
(Fig. IB), suggestive of a non-optimal adjustment and indicating 
that in such cases PBC breaks down. This is not suprising since 
PBC relies heavily on the ability to detect clear unmethylated and 
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Fig. 1. (A) Density distributions of beta-values for the typel probes, 
type2 probes (unnormalized) and normalized type2 probes for a breast 
tumour sample from Datasetl. (B) Density distributions of beta- values 
for the typel probes, type2 probes (unnormalized) and normalized type2 
probes for a head and neck tumour sample from Dataset3. Left panels 
are for PBC, right panels for BMIQ 
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Fig. 2. (A) Boxplots comparing the standard deviation (y-axis) of typel 
(1) and type2 (2) probes over the three HCT116 WT replicates from 
(Dedeurwaerder et al., 2011), for the case of no design normalization 
(NONE), PBC and BMIQ. (B) As (A) but now comparing the 
Manhattan distances over type2 probes only for each pair of replicates 



methylated modes in the typel density distribution in order to 
then adjust the type2 distribution accordingly. Importantly, 
BMIQ does not use the typel modes to adjust the type2 data, 
and hence BMIQ normalization of the type2 probes generated a 
much smoother density distribution, suggestive of an improved 
normahzation framework (Fig. IB). Moreover, the tail ends of 
the BMIQ type2 distribution better matched those of the typel 
distribution without affecting the fractions of unmethylated, 
hemimethylated and fully methylated probes, which are pre- 
served by the BMIQ transformation. 

3.2 BMIQ reduces technical variation 

To further test BMIQ we applied it to Dataset2 (CL) consisting 
of three rephcates of a given cell-line, to investigate if reprodu- 
cibihty is improved. First, we computed for each of the probes its 
standard deviation across the three rephcates and for each of the 
three scenarios: no normalization, PBC and BMIQ. As seen, 
BMIQ performed similarly to PBC and led to a significant re- 
duction in inter-rephcate variabihty (Fig. 2A). To check this fur- 
ther, we compared the normahzation methods in terms of the 
Euclidean distance between the three possible pairs of rephcates 
across the type2 probes (Fig. 2B). Using this measure, BMIQ not 
only led to a significant improvement, but was also marginally 
better than PBC (Fig. 2B). 

3.3 BMIQ reduces bias of typel methylation values 

Using replicates to evaluate normalization methods assesses the 
method in terms of reducing technical variability but does not 
evaluate whether the actual values of the replicates are closer to 
the true estimate. This requires comparison with a gold- standard, 
which is provided by matched BPS data (Dedeurwaerder et al., 
2011). Hence, we compared the methods in terms of the devi- 
ations from BPS methylation values for the nine type2 probes in 
Dedeurwaerder et al. (2011) for which matched 450 k BPS data 
were available. Similar to PBC, we observed that BMIQ signifi- 
cantly reduced the bias of type2 values (Fig. 3), although there 
was no improvement over PBC itself, presumably owing to the 
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Fig. 3. Barplot comparing the maximum and mean absolute deviation of 
450 k values from their corresponding bisulphite pyrosquencing values 
over the nine type2 probes considered in Dedeurwaerder et al., (2011). 
We compare these deviation measures for the case of no design normal- 
ization (NONE), PBC normalization and BMIQ 
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fact that in these specific samples the methylated typel peak was 
well defined, a scenario in which PBC works well. 

3.4 BMIQ eliminates the typel enrichment bias 

To further test BMIQ, we considered the supervised context, in 
which a ranked Hst of probes correlating with a phenotype of 
interest is derived. Given the higher dynamic range of typel 
probes, one expects that this would favour typel probes and 
that therefore there would be a relative over enrichment of 
typel over type2 probes in a top ranked Hst of probes. 
However, one key difficulty when assessing whether there is a 
bias towards typel probes is that typel and type2 probes differ 
significantly in terms of their biological characteristics, in par- 
ticular in terms of CpG density. Hence, in order to avoid con- 
founding by CpG density, we only selected probes that mapped 
to CpG islands and to 200 bp upstream of the TSS, thus allowing 
a sensible comparison between typel and type2 probes. We con- 
sidered three different datasets and derived for each a ranked list 
of probes associated with a phenotype of interest: breast cancer 
versus normal breast [Datasetl (BT)], HPV + versus HPV-HNC 
[Datasets (FFFE)], and CIMP+ versus CIMP- (GBM) 
(DatasetS). The ranking was performed using the magnitude of 
differential methylation. Although this ranking does not take the 
within-phenotype variability into account, it remains a popular 
method (Dedeurwaerder et al, 201 1; Du et aL, 2010), and for our 
purposes, using the absolute difference in beta-values allows us 
to better interpret the performance of the different normalization 
methods. To assess any potential bias towards typel probes, we 
computed for a given number of top ranked probes the odds 
ratio (OR) of relative enrichment of typel over type2 probes. 
Across all three datasets, we indeed observed a bias towards 
typel probes, although the severity of this bias varied substan- 
tially from study to study (Fig. 4). Using PBC, in one dataset this 
bias was eliminated; however, in the other two datasets, PBC 
overcorrected the data leading to a bias favouring type2 



probes. In contrast, BMIQ eliminated the typel enrichment 
bias in all three datasets (the resulting OR was always close 
to 1) without overcorrecting the data and avoiding the type2 
enrichment bias seen for PBC. 

3.5 Reduced technical variability within probe clusters 

To further assess BMIQ, we devised an evaluation framework 
which exploits the well known spatial correlation of DNA 
methylation at scales <500bp (Eckhardt et al., 2006). 
Approximately 27% of the 450 k probes fall into 12 501 probe 
clusters, defined as contiguous regions containing at least seven 
probes with no two adjacent probes separated by >300 bp (Jaffe 
et al., 2012). Within these probe clusters, we posited that pairs of 
adjacent probes, one from each design and within 200 bp of each 
other, should have similar methylation values. Among the 12 501 
probe clusters we identified on the order of '^SOOOO of such 
adjacent typel-type2 probe pairs. Thus, to evaluate the normal- 
ization algorithms, we asked which one minimizes the absolute 
difference in methylation between such closely adjacent typel- 
type2 pairs. We considered a total of 10 independent datasets, 
seven of which had idat files, thus allowing also for a direct 
comparison with SWAN (Maksimovic et al., 2012). For each 
dataset, we computed the mean of the absolute deviations over 
probe pairs and samples. Comparison of these average deviations 
revealed that BMIQ consistently reduced the technical variation, 
while also outperforming PBC and SWAN (Table 1). In fact, in 
9 of 10 datasets, BMIQ was substantially better as assessed using 
a paired Wilcoxon rank sum test over all probe pairs and samples 
(Table 1). Example methylation profiles within these probe 
clusters confirmed that BMIQ successfully reduces the technical 
variability, while PBC can break down either overcorrecting 
or suppressing the type2 data values, leading to substantial dif- 
ferences in methylation between neighbouring probes, even at 
scales of <100bp (Fig. 5 and Supplementary Table SI). 
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Fig. 4. OR of relative enrichment of typel probes over type2 probes among the top 100, top 250, top 500 and top 1000 ranked probes, where probes were 
ranked according to the absolute difference in methylation (beta- values). ORs are shown for the case of no design normalization (NONE), PBC and BMIQ 
normalization. Supervised analysis and ranking was performed only on probes mapping to CpG islands and within 200 bp upstream of transcription start 
site to correct for biological differences between typel and type2 probes. The line OR= 1 represents the ideal scenario of no relative enrichment of typel 
versus type2 probes. The 95% confidence envelope around OR= 1 is shown to assess significant deviations from OR= 1. (A) Eight breast cancers versus 
eight normal breast (Datasetl), (B) 18 HPV+ HNCs versus 14 HPV- HNCs (DatasetS), (C) 49 CIMP+ GBMs versus 32 CIMP- GBMs (Dataset5) 
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3.6 BMIQ robustly identifies features associated 
with HPV status 

Finally, it must be verified that the reduction in technical vari- 
ance obtained with BMIQ is not at the expense of a reduced 
biological signal. Since it is difficult to establish what constitutes 
a true positive, we used a training test set strategy, to identify 
features in a training set and calling them true positives if vali- 
dated in a test set. This strategy thus allows for a comparison 

Table 1. For each dataset we compare the absolute deviation in methy- 
lation between adjacent typel-type2 probe pairs (probes within 200 bp of 
each other), averaged over probe pairs and samples, for four different 
normalisation methods 



Dataset 


NONE 


PBC 


SWAN 


BMIQ 


P 




(%) 


(%) 


(%) 


(%) 




BT 


7.8 


6.3 


NA 


6.2 


<io-^^ 


CL 


8.6 


18.4 


NA 


7.2 


<io-^^ 


FFPE 


9.2 


8.0 


8.5 


7.8 


<io-^^ 


FF 


8.5 


8.1 


7.6 


7.3 


<io-^^ 


GBM 


9.2 


7.6 


NA 


7.5 


<io-^^ 


TCGA 


9.4 


7.8 


8.3 


7.4 


<io-^^ 


LIV 


10.0 


6.3 


7.4 


6.4 


~1 


LC 


10.3 


7.0 


7.7 


6.7 


<io-^^ 


BLDC 


11.0 


8.0 


7.9 


7.6 


<io-^^ 


HCC 


12.0 


8.5 


8.7 


8.1 


<io-^^ 



NONE refers to the case of no adjustment for probe design type. The last column 
give the paired Wilcoxon rank sum test P-value (treating each probe-pair deviation in 
each sample as a separate value), assessing the statistical significance that the absolute 
deviation for BMIQ is smaller than the next best competing method. NA indicates 
non-available owing to lack of access to idat files needed for processing by SWAN. 
In bold-face we show the smallest deviation across methods. 



of sensitivity and positive predictive value (PPV) between the 
different normalization methods. To perform this analysis, 
we used Dataset4(FF) consisting of 2 HPV+ and 3 HPV— 
fresh frozen head and neck cancers to derive features associated 
with HPV status. As test set we used Dataset3(FFPE) consisting 
of 18 HPV+ and 14 HPV- head and neck cancers (FFPE 
tissue). Using limma (Smyth, 2005) and an FDR threshold 
of 0.35, we observed that BMIQ identified substantially more 
differentially methylated features than PBC or SWAN 
(Table 2). Importantly, this was not at the expense of a smaller 
PPV, and so, overall, BMIQ identified substantially more 
true positives (Table 2). 

4 DISCUSSION 

In this work we have presented a novel mixture-model-based 
algorithm (BMIQ) for correcting the bias associated with 
type2 probe values in 450 k studies. Confirming the observations 



Table 2. Table listing the number of differentially methylated probes 
(nDMPs) associated with HPV status in Dataset4 (FF), and the corres- 
ponding estimates for the positive predictive value (PPV) and number of 
true positives (nTPs) estimated using Dataset3 (FFPE) as test set 



Metric 


NONE 


PBC 


SWAN 


BMIQ 


nDMP 


51 


70 


41 


252 


PPV 


0.25 


0.18 


0.19 


0.20 


nTP 


13 


13 


8 


51 



DMPs were defined at an FDR threshold of 0.35, and those with the same sign of 
limma r-statistic in the two sets and with a corresponding P-value<0.01 in the test 
set were deemed true positives. 





44952500 

Genoma Position (Chr19) 




166953300 166953500 
Gsnome Position {CHr!} 



79298200 79296400 79293600 
Ganoma Position {Chr15} 



Fig. 5. Examples of methylation profiles, from Dataset4(FF), of three probe clusters on chromosomes 19, 1 and 15, respectively. The design type of each 
probe is indicated with 1 and 2. The non design normalized data (NONE), PBC, SWAN and BMIQ corrected data are superimposed. Observe how 
across the three loci, BMIQ generally corrects the data in a way which is more consistent with the neighbouring typel values. In the left panel PBC 
overcorrects the data, in the right panel there is suppression, while in the middle panel PBC both overcorrects and suppresses beta values. The data values 
can be found in Supplementary Table SI 



194 



Adjusting design bias in 450k studies 



made in Touleimat and Tost (2012) and Maksimovic et al. 
(2012), we have seen that PBC can break down in samples 
with ill-defined typel methylation peaks, causing sharp, almost 
discontinuous changes (which we call 'holes') in the density dis- 
tributions (Fig. IB), which motivated our quest to find a more 
robust algorithm. We have shown that BMIQ improves the 
robustness and can successfully normalize the type2 distribution, 
avoiding the appearance of such 'holes' (Fig. IB). Moreover, 
BMIQ successfully matches the tail-ends of the typel and 
type2 distributions, while faithfully preserving the proportions 
of unmethylated and methylated probes within each of the 
two designs. 

To further test BMIQ, we used data on technical replicates (to 
show that it reduces technical variability) and matched BPS data 
(to show that it reduces the bias of type2 probe values). Using 
these criteria, we have seen that BMIQ leads to significant im- 
provements, similar to the improvements noted for PBC (Figs 2 
and 3). In relation to these evaluation criteria, it is worth point- 
ing out that BMIQ was compared with PBC on samples with 
well-defined typel methylation peaks, i.e. on data that were used 
to develop PBC itself. Hence, it is likely that an evaluation of 
technical reproducibility (using replicates) and type2 value bias 
(using matched BPS data) on data where the methylated typel 
peaks are less well-defmed would favour BMIQ over PBC. 
However, we did not have access to technical replicates or 
matched BPS data in the other specific datasets considered 
here. Therefore, in order to further assess BMIQ, we devised a 
supervised framework across three independent datasets to ob- 
jectively compare the algorithms in their ability to reduce the 
expected enrichment bias of typel probes. First, we showed 
that if no design normalization is performed then there is 
indeed an enrichment bias towards typel probes, even when ad- 
justed for CpG density (Fig. 4). We also showed that in two 
datasets, PBC overcorrected the type2 data, leading to an over- 
inflated dynamic range, thus favouring type2 probes and causing 
an 'overshooting' of the enrichment scores, reflected by a signifi- 
cant underenrichment of typel probes (Fig. 4B and C). In con- 
trast, BMIQ successfully avoided any typel/type2 enrichment 
bias in all three datasets, indicative of an improved normaliza- 
tion of type2 values (Fig. 4). We should point out that the over- 
correction of type2 values and the associated overinflated 
dynamic range caused by PBC is consistent with the presence 
of 'holes' in the hemimethylated region of the type2 density dis- 
tribution. Thus, with PBC there is an artificial expulsion of data 
points from the hemimethylated region to the unmethylated and/ 
or methylated extremes. In a further assessment of BMIQ, we 
conducted a detailed spatial analysis of DNA methylation at the 
level of probe clusters across 10 independent datasets. By care- 
fully analysing adjacent typel-type2 probe pairs, we observed 
that PBC can often overcorrect or suppress the data (in some 
cases inducing abnormally large 30% changes in methylation), in 
contrast to BMIQ, which normalized type2 values in a way that 
rendered them more consistent with the values of neighbouring 
typel probes (Fig. 5 and Table 1). Interestingly, BMIQ also 
appeared to outperform SWAN (Table 1), which is part of the 
popular and widely used minfi package (Hansen and Aryee, 
2012). Of note, the reduction in technical variance achieved 
by BMIQ was not at the expense of a lower biological 
signal (Table 2). 



In summary, using a number of different evaluation criteria 
and numerous datasets, we have seen that BMIQ compares fa- 
vourably with both PBC and SWAN. Although we did not com- 
pare BMIQ to SQN (Touleimat and Tost, 2012), this latter 
method is very similar to SWAN, as they both rely on a probe 
subset quantile normalization. Like SQN/SWAN, BMIQ uses 
quantiles to normalize the type2 probe values into a distribution 
that is comparable with that of typel probes. However, unlike 
SQN and SWAN, BMIQ is based on an explicit beta-mixture 
modeling framework, and uses state-membership probabilities 
under this beta mixture model to reassign the quantiles of 
the type2 probes according to the typel distribution. Thus, 
BMIQ is assumption-free, as it does not require a separate 
normalization to be performed on selected subsets of probes 
that are matched for biological characteristics (e.g. CpG density), 
as done in SQN and SWAN. In fact, under the BMIQ frame- 
work, all the biological differences (including CpG density) 
between the typel and type2 probes are captured by the esti- 
mated fractions of unmethylated, hemimethylated and methy- 
lated probes, which will be different between the two assays. 
Thus, BMIQ does not depend on a priori and somewhat 
arbitrary choices of which biological characteristics to use 
when matching the typel and type2 distributions. For instance, 
in SQN the normalization is performed on probe subsets defined 
by specific CpG characteristics (e.g. shelves, shores, CpG is- 
lands); however, multiple different definitions for say CpG is- 
lands exist (Gardiner- Garden and Frommer, 1987; Takai and 
Jones, 2002; Wu et al, 2010; Zhao et al, 2009). Similarly, in 
SWAN the number of CpGs in the probe body, even if they 
differ by one, is used to define probe normalization categories, 
and thus it is unclear whether these probe categories represent an 
optimal way of dividing the probes up. Therefore, we see the 
beta-mixture model framework of BMIQ as an important con- 
ceptual advantage over SQN/SWAN, since, as demonstrated 
here, it successfully normalises type2 probe values, faithfully pre- 
serving the numerous and complex biological differences that 
exist between the two designs without ever needing to define 
probe subsets. Nevertheless, it will be interesting to conduct a 
comprehensive and detailed comparison of BMIQ, SQN and 
SWAN on matched 450 k BPS data on a sufficiently large 
number of loci and samples. 



5 CONCLUSIONS 

We have presented a mixture model assumption-free normaliza- 
tion algorithm, BMIQ, which will be useful for correcting the 
bias associated with the type2 assay in DNA methylation studies 
using the lUumina Infinium 450 k platform. 
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